The Plant Genome — Latest Matching Preprints

1

Genomic prediction of single cross families of perennial ryegrass in two nitrogen managements

Santos Junior, D. R. d.; Fe, D.; Lenk, I.; Jensen, C. S.; Asp, T.; Janss, L.; Bornhofen, E.

2026-05-08 genomics 10.64898/2026.05.05.722839 medRxiv

Top 0.1%

38.4%

The performance of a single cross is determined by the average additive effects of the parents, as well as the interactions between them. These quantities can be estimated using an appropriate genetic design, allowing for the estimation of general (GCA) and specific (SCA) combining abilities. The prediction of GCA for new parents and the total genetic value of unrealized crosses can be made when genome-wide marker information is available. Several studies in crops such as maize and rice have demonstrated the potential of genomic-assisted prediction of single-cross performance in economically important crops. However, no study to date has explored its relevance in perennial ryegrass, an obligate allogamous species that is bred in genetically heterogeneous families. In this study, we aimed to estimate genetic parameters and assess the ability of genomic models to predict the performance of F2 families in terms of dry matter yield and nutritive quality traits. We used data from a large partial diallel involving 104 parents from two distinct subpopulations, as inferred by admixture analysis. F2 families were evaluated in multiple environments and under two nitrogen availability conditions. Genotyping-by-sequencing of the parent plants produced 42,145 variants after quality control, which were used to estimate genomic relationships based on identity-by-state. Variance component estimation revealed limited GCA and SCA interactions with the environment, and particularly with nitrogen management. The predictive abilities of two parental models exceeded 0.60 and often surpassed 0.70 for most traits. However, incorporating non-additive effects into the model did not improve predictive ability. We leveraged the genetic diversity among parents to map genomic regions associated with all recorded traits. 
Genome-wide association studies (GWAS) by genomic best linear unbiased prediction (GBLUP) identified six quantitative trait loci (QTL) regions, with 45 candidate genes within the linkage disequilibrium range, estimated at approximately 92 kb. Our results demonstrate that genomic prediction of single crosses can be performed with high accuracy, especially when both parents are also progenitors of families in the training set.

2

Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv

Top 0.1%

34.2%

Genomic prediction models that account for genotype-by-environment (GxE) interaction have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and, for yield and test weight, in CV1 (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10-0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts.

Core ideas:
- Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments.
- Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments.
- Prediction of new lines in new environments remains challenging.

Plain language summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.

3

Phenotyping replication is a major determinant of genomic predictive ability in sweet sorghum (Sorghum bicolor Moench)

Charles, J. R.; Rice, B.; Tovignan, T.; Morris, G. P.; Pressoir, G.

2026-06-19 genomics 10.64898/2026.06.15.731123 medRxiv

Top 0.1%

34.0%

Genomic selection can increase the rate of genetic gain in crop breeding programs, but its effectiveness depends on the reliability of phenotypic data, the size and composition of the training population (TP), and the statistical model used to estimate genomic breeding values. These design choices are especially important in resource-limited breeding programs, where additional replication, larger TPs, and more extensive genotyping compete for the same resources. Using empirical data from a sweet sorghum [Sorghum bicolor (L.) Moench] breeding population, developed by CHIBAS, we evaluated the effects of phenotyping replication, TP size, training-validation genomic relatedness, and genomic prediction (GP) model on predictive ability (PA). Grain yield, plant height, stem weight, and total soluble solids were evaluated across three field environments. Few studies in sorghum have examined these factors together with comparable empirical rigor. Increasing replication improved genomic heritability and PA for all traits and environments, with the largest gains observed for grain yield. Larger TPs and increased training-validation genomic relatedness also improved PA, but their effects were strongest when phenotype estimates were based on multiple replicates. GP models showed largely comparable PA across all evaluated traits, with a few exceptions. These findings provide practical guidance for optimizing genomic selection in resource-limited sorghum breeding programs.

Article summary: Genomic selection can accelerate breeding only when the phenotypes used to train prediction models have high reliability. Using a sweet sorghum breeding population evaluated in three Haitian field environments, we quantified how replication number, training population size, training-validation genomic relatedness, and prediction model affected genomic predictive ability for grain yield, plant height, stem weight, and total soluble solids. Replication increased genomic heritability and predictive ability for all traits, with the strongest effects for grain yield. Larger and more connected training populations improved prediction, mainly when replication was adequate. These results provide practical guidance for resource-limited breeding programs.

Core ideas:
- In this empirical sweet sorghum breeding population, phenotyping replication was the dominant factor explaining variation in genomic predictive ability across traits and environments.
- The benefit of larger training populations and greater training-validation genomic relatedness increased when phenotype estimates were based on more replicates.
- Grain yield, the most environmentally sensitive trait evaluated, showed the largest response to improved replication and training-population design.
- Bayesian models, rrBLUP, and GBLUP showed similar predictive abilities across traits and environments, suggesting that phenotyping and experimental design may be more important than model complexity.

4

Genome-Wide Markers Predict Metribuzin Tolerance in Southern Soft Red Winter Wheat

Sellani, J.; Anzueto, H.; Arcenaux, K.; Price, P. T.; Brown-Guedira, G.; Harrison, S.; DeWitt, N.

2026-07-03 genomics 10.64898/2026.06.28.733875 medRxiv

Top 0.1%

33.9%

Metribuzin is a versatile herbicide effective against various annual grasses and broadleaf weeds found in wheat fields. However, it can cause foliar damage to wheat, impacting plant health and yield. A clearer understanding of the genetic architecture associated with metribuzin tolerance is necessary to guide marker-based breeding strategies. This study evaluated 351 historic Gulf Atlantic Wheat Nursery (GAWN) wheat breeding lines representative of southern US soft red winter wheat (SRWW) germplasm. Field trials were conducted at Winnsboro (WN) and Baton Rouge (BR), Louisiana, in 2016 and 2017. Metribuzin was applied at specific growth stages, and tolerance was assessed based on visual foliar damage. Genomic data from 6,252 filtered single nucleotide polymorphism (SNP) markers were used to estimate narrow-sense heritability, conduct genome-wide association studies (GWAS), and assess genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP). Broad-sense heritability ranged from 0.54 to 0.69 within environments and reached 0.77 across environments, while narrow-sense heritability ranged from 0.35 to 0.47, indicating moderate additive genetic control. No SNP surpassed the significance threshold, but genomic prediction (GP) showed moderate to strong predictive ability (PA) across environments, with the highest accuracy (r = 0.62) observed between BR17 and WN17. These results indicate that metribuzin tolerance in SRWW is primarily controlled by multiple small-effect loci and that genomic selection (GS) provides a more effective breeding strategy than marker-assisted selection for improving tolerance in southern wheat germplasm.
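
For readers unfamiliar with how a value such as r = 0.62 is typically obtained, the following is a minimal sketch, assuming that predictive ability here means the Pearson correlation between observed phenotypes and genomic predictions in a validation set (a common convention, not something the abstract spells out). The function name and the damage scores are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_o * var_p)


# Hypothetical foliar damage scores for five lines in a validation
# environment, with hypothetical genomic predictions for the same lines.
observed_damage = [2.0, 3.5, 1.0, 4.0, 2.5]
predicted_damage = [2.2, 3.0, 1.5, 3.8, 2.9]
predictive_ability = pearson_r(observed_damage, predicted_damage)
```

In a cross-environment setting like the one described, the model would be trained on one environment and `predicted_damage` would hold its predictions for lines observed in another.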

5

Comparison of localGEBV and Optimal Haplotype Stacking Fitness Functions using a Novel R Package: HapSelect

Shaffer, W.; Papin, V.; Carter, Z.; Brunner, S. M.; Tong, J.; Villiers, K.; Robinson, H.; Voss-Fels, K.; Hayes, B. J.; Hickey, L.; Dinglasan, E.

2026-07-13 genetics 10.64898/2026.07.08.737160 medRxiv

Top 0.1%

30.7%

Haplotype-based breeding strategies have emerged as promising approaches to maximize long-term genetic gain by identifying complementary parental combinations while maintaining genetic diversity. However, these methods typically require phased genotypes and more intensive workflow pipelines and skillsets. We developed a novel local genomic estimated breeding value (localGEBV) fitness function with similar intent to the optimal haplotype stacking (OHS) framework fitness function and implemented both in the novel R package, HapSelect. Our aim was to evaluate whether phased haplotypes provide additional benefit over the more easily available dosage-based unphased genotypes in highly inbred crops. A subset of a bread wheat nested association mapping (NAM) population comprising 444 lines genotyped with 6,054 DArT-Seq markers was analysed. Marker effects were estimated using rrBLUP; localGEBV and haplotype effects were calculated across linkage disequilibrium-defined haploblocks; and genetic algorithms (GA) were used to identify optimal sets of 30 founders using either a localGEBV-derived fitness function with unphased dosage inputs or the OHS fitness function with phased inputs. Selected parental sets were compared with conventional truncation selection (TS) through 150 generations of forward simulation. The OHS fitness function achieved a marginally greater optimized ultimate GEBV than the localGEBV fitness function during GA optimization, and only 18 of the 30 selected founders overlapped between the two methods. Despite these differences, forward simulations demonstrated nearly identical long-term genetic gain for localGEBV and OHS-selected founders, with both approaches outperforming conventional truncation selection by maintaining greater genetic diversity and delaying the genetic plateau. The minimal difference between localGEBV and OHS is likely attributable to the high homozygosity of the population, where localGEBV and haplotype effects are nearly confounded. 
These results demonstrate that dosage-based localGEBV provides a practical alternative to phased haplotype approaches for parent selection in inbred crops, substantially simplifying genomic workflows while maintaining long-term breeding performance. Future work should evaluate these methods in more diverse inbred populations and outbred species, where greater haplotypic diversity may increase the advantage of true haplotype-based optimizations.

6

Efficient genomic prediction at reduced training size and moderate marker density in an expanded aus-NAM population of rice

Kitony, J. K.; Reyes, V. P.; Sunohara, H.; Tasaki, M.; Yamasaki, M.; Mori, J.-i.; Shimazu, A.; Nishiuchi, S.; Michael, T. P.; Doi, K.

2026-05-01 plant biology 10.64898/2026.04.28.721500 medRxiv

Top 0.1%

30.2%

Genomic selection (GS) can accelerate genetic gain in crops, but its effectiveness depends on training population design and marker density. Nested association mapping (NAM) populations provide a structured framework that captures broad allelic diversity within a controlled genetic background. Here, we evaluated genomic prediction (GP) and genome-wide association study (GWAS) performance in an expanded aus-NAM population of rice comprising 1,818 recombinant inbred lines across 14 families and 11 agronomic traits, using genotyping-by-sequencing (GBS) markers and projected whole-genome sequence variants. Prediction accuracy plateaued at moderate marker densities (~20k SNPs) and with training populations of ~500 lines (~40-60% of the available pool), with trait heritability emerging as the strongest determinant of predictive performance rather than model choice or marker density. In contrast, GWAS resolution continued to improve with increasing marker density, enabling detection of additional loci, including a chromosome 12 locus associated with heading date, while consistently recovering well-characterized genes such as EARLY HEADING DATE 1 (Ehd1) and SEMIDWARF 1 (SD1). These contrasting patterns indicate that GP reaches near-optimal performance once genome-wide variation is adequately represented, whereas GWAS benefits from higher marker density through improved locus resolution. The present study establishes a benchmark for implementing breeding programs involving japonica/indica crosses using GP in a single environment.

7

Leveraging genome-wide association studies and genomic prediction for distinctness, uniformity, and stability (DUS) testing in maize

Daware, A. v.; Hacke, C.; Remay, A.; Starnberger, P.; Schraml, C.; Collonnier, C.; Laurens, F.; Schmid, K. J.

2026-06-12 genetics 10.64898/2026.06.10.731330 medRxiv

Top 0.1%

22.2%

Testing for distinctness, uniformity, and stability (DUS) is a requirement for plant variety registration and is based on phenotypic traits, which is time-consuming and sensitive to environmental variation. Advances in genomics allow DUS testing to be complemented with molecular markers, for which two models in DUS testing were proposed by the Union for the Protection of New Varieties of Plants (UPOV). A use case was described for maize, but implementation has been hindered by a lack of suitable markers and validated analytical frameworks. We address these challenges by integrating historical DUS characteristics scores from 352 European hybrid maize varieties with high-density genome-wide single nucleotide polymorphism (SNP) data. Using genome-wide association studies (GWAS), we identified 18 genomic regions and candidate genes associated with 12 DUS characteristics, enabling the development of diagnostic markers consistent with the UPOV model "Characteristic-Specific Molecular Markers". Since most DUS traits are polygenic, we combined GWAS-informed marker selection with XGBoost-based machine learning to predict notes of DUS characteristics. This approach achieved strong predictive performance across multiple traits (mean accuracy 0.67), demonstrating its potential for managing reference collections under the UPOV model "Combining phenotypic and molecular distances in the management of variety collections". Both approaches were validated for two characteristics using independent public USDA-NPGS maize datasets (>1,700 accessions), highlighting the value of public data for method validation. We also identify key limitations of historical DUS data, including imbalanced and sparse trait representation, and discuss mitigation strategies. Despite these constraints, our results demonstrate that molecular markers may improve maize DUS testing, enabling faster, more accurate variety registration and supporting accelerated crop improvement. 
Key message: Historical DUS datasets can be used to identify marker-trait associations of DUS characteristics using genome-wide association studies (GWAS) and to develop a genomic prediction framework for accurate prediction of DUS character notes from marker data.

8

Genotypic and multi-environment phenotypic evaluation of the lima bean USDA National Plant Germplasm System collection

Adaskaveg, J. A.; Hershberger, J.; Farmer, A. A.; Penmetsa, R. V.; Garcia-Lopez, I.; Garcia-Abadillo, J.; Zhou, X.; Huynh, B.-L.; Roberts, P.; Ernest, E. G.; Warburton, M. L.; Jarquin, D.; Dohle, S.; Palkovic, A.; Parker, T. A.; Gepts, P.; Diepenbrock, C. H.

2026-06-06 plant biology 10.64898/2026.06.03.729973 medRxiv

Top 0.1%

21.9%

Lima bean (Phaseolus lunatus L.) is an economically and agronomically important grain legume. Lima beans (or limas) show a range of climatic adaptations with independent domestications in the Andes (large-seeded) and Mesoamerica (small- or medium-seeded). We generated and integrated genotypic and comprehensive field- and laboratory-based phenotypic information for the available accessions in the USDA National Plant Germplasm System collection across multiple environments to inform germplasm utilization in breeding. A total of 810 accessions were genotyped using short-read, low-coverage sequencing. Accession geographic origin and domestication explained population structure. A partially overlapping subset of the panel (n=141-308) was field-evaluated across two years in each of Davis, CA, Central Ferry, WA, and Coachella Valley, CA (the latter was fall-planted for evaluation of photoperiod-sensitive accessions) to assess trait performance in contrasting environments. Agronomic traits such as determinacy and flowering time, and seed traits such as seed coat color and hundred-seed weight, were scored. Macronutrient traits (protein, starch, fat, and ash content) were measured on dry (mature) harvested grain via near-infrared spectroscopy. Genome-wide association analyses identified loci significantly associated with descriptive, agronomic, and seed traits, including orthologs of known genes in common bean and novel candidate regions. Genomic predictive abilities were moderate to high for key traits. Finally, we established a conditional core collection that was constrained to include 211 extensively phenotyped accessions and for which 91 supplemental accessions were selected to maximize genetic diversity from among the genotyped accessions. Overall, these resources provide a foundation to support genomics-assisted breeding of limas.

9

Efficient Optimization of Genotype Pairs for Intercropping using Genomic Prediction and Bayesian Optimization

Kinoshita, S.; Iwata, H.

2026-05-18 genomics 10.64898/2026.05.15.725387 medRxiv

Top 0.1%

18.6%

Intercropping is a promising strategy to improve productivity and sustainability in agricultural systems, but designing effective genotype combinations remains a major challenge owing to the rapid increase in possible pairings as the number of candidate genotypes increases. This creates a practical bottleneck because field evaluation of all combinations is infeasible under realistic resource constraints. Here, we propose a framework that integrates genomic prediction and Bayesian optimization to support efficient decision-making for intercropping system design. Using genome-wide marker data from sorghum and soybean, we simulated intercropping performance across 5,214 genotype pairs under certain genetic architectures, including variation in heritability, correlations between direct and indirect genetic effects, and the contribution of pair-specific interactions. Genomic prediction models incorporating direct and indirect genetic effects substantially improved prediction accuracy compared with models based on direct genetic effects alone, and inclusion of specific mixing ability further enhanced the performance under high-heritability conditions. When coupled with Bayesian optimization, the models rapidly identified superior genotype pairs, requiring fewer evaluation cycles than random or prediction-only search strategies. Acquisition functions that account for predicted uncertainty were most effective in complex scenarios involving interaction effects or negative correlations between direct and indirect effects. These results demonstrate that combining genomic prediction with Bayesian optimization can substantially reduce the experimental burden associated with intercropping design, while improving the efficiency of identifying high-performing genotype pairs. 
The proposed framework provides a practical approach for prioritizing candidate mixtures in breeding and field evaluation, and contributes to the development of data-driven strategies for sustainable agricultural systems.

Highlights:
- A data-driven framework was developed to optimize genotype pairs in intercropping.
- Modeling indirect effects improved prediction accuracy across genotype pairs.
- Pair-specific interactions enhanced prediction under high-heritability conditions.
- Bayesian optimization identified superior pairs under limited evaluation capacity.
- The framework reduces field-testing requirements for intercropping system design.
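
The abstract above reports that acquisition functions accounting for predicted uncertainty were most effective, without naming them. As an illustration only, here is a minimal sketch of one widely used uncertainty-aware acquisition rule, the upper confidence bound (UCB), applied to choosing the next genotype pair to test. The function names, the `kappa` parameter, the genotype labels, and all numbers are hypothetical, and this is not claimed to be the authors' implementation.

```python
def upper_confidence_bound(mean, sd, kappa=2.0):
    """Score a candidate by predicted mean plus kappa times predicted SD."""
    return mean + kappa * sd


def choose_next_pair(candidates, evaluated, kappa=2.0):
    """Return the untested pair with the highest UCB score.

    candidates: dict mapping (sorghum_id, soybean_id) -> (mean, sd),
                e.g. posterior mean and SD from a genomic prediction model.
    evaluated:  set of pairs that have already been field-tested.
    """
    best_pair, best_score = None, float("-inf")
    for pair, (mean, sd) in candidates.items():
        if pair in evaluated:
            continue  # never re-select an already tested pair
        score = upper_confidence_bound(mean, sd, kappa)
        if score > best_score:
            best_pair, best_score = pair, score
    return best_pair


# Hypothetical predictions for three sorghum-soybean pairs.
predictions = {
    ("sorghum_A", "soybean_X"): (10.0, 0.5),
    ("sorghum_A", "soybean_Y"): (9.0, 2.0),
    ("sorghum_B", "soybean_X"): (11.0, 0.1),
}
already_tested = {("sorghum_B", "soybean_X")}

# kappa > 0 rewards uncertain candidates; kappa = 0 is prediction-only search.
next_uncertainty_aware = choose_next_pair(predictions, already_tested, kappa=2.0)
next_greedy = choose_next_pair(predictions, already_tested, kappa=0.0)
```

With these toy numbers the uncertainty-aware rule picks the pair with the lower predicted mean but larger predicted SD, while the prediction-only rule picks the pair with the higher mean, which is the contrast between search strategies that the abstract describes.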

10

Knowledge-guided Bayesian optimization using pre-trained LLMs speeds up the identification of superior genotypes from germplasm collection

Hamazaki, K.; Tsuda, K.

2026-07-02 bioinformatics 10.64898/2026.06.28.735149 medRxiv

Top 0.1%

18.6%

Background: Germplasm collections contain wide genetic diversity that is valuable for plant breeding, but conducting phenotypic evaluation for all genotypes in field trials is rarely feasible. Bayesian optimization offers a way to decide, season by season, which genotypes to cultivate in order to identify superior genotypes with fewer evaluations. However, standard Bayesian optimization commonly starts from randomly selected genotypes and mainly relies on surrogate models built from marker genotype information, while the text-based passport information that accompanies germplasm is not fully used. We examined whether pre-trained large language models can provide prior knowledge that improves these decisions in germplasm evaluation. Results: We constructed a large-language-model-guided Bayesian optimization framework that introduces large language models into two parts of the Bayesian optimization workflow. In zero-shot warmstarting, a large language model proposes initial genotypes using passport information such as cultivar name, country of origin, and subpopulation, optionally together with principal component scores derived from genome-wide single-nucleotide-polymorphism markers. In addition, we evaluated a large-language-model-based surrogate model that predicts phenotypic values for untested genotypes using in-context learning from previously evaluated genotypes. Using a rice germplasm panel and two target traits (seed number per panicle for maximization and protein content for minimization), we compared strategies. For seed number per panicle, zero-shot warmstarting with a general-purpose instruction-following model reduced the number of evaluated genotypes needed to reach the best genotype, whereas improvements were small for protein content. 
When genomic information was available, Gaussian-process-based Bayesian optimization was the strongest overall approach, while the large-language-model-based surrogate model outperformed random baselines and was competitive in some settings. When genomic information was not available, predictions based on passport information improved efficiency compared with fully random strategies. Conclusions: Pre-trained large language models can inject useful agronomic knowledge into Bayesian optimization for germplasm evaluation, particularly by improving early-stage genotype selection, and can also support optimization when genomic information is unavailable. As models better handle long genomic sequences together with passport information, large-language-model-guided Bayesian optimization may become a practical and explainable decision-support approach for agricultural optimization.

11

Multi-Trait Meta-QTL Analysis Reveals Genomic Hotspot Classes for Strategic Maize Improvement

Parthasarathy, S.; Rocheford, T.; Koehler, K.

2026-06-10 plant biology 10.64898/2026.06.06.730627 medRxiv

Top 0.1%

18.1%

Background: Decades of maize (Zea mays L.) QTL mapping have produced fragmented results across hundreds of independent studies, characterized by broad confidence intervals, population-specific effects, and a predominantly single-trait analytical scope. Comprehensive multi-trait integration remains limited, yet it could substantially improve our understanding of trait relationships for strategic breeding. We integrated 2,701 QTLs published over 30 years across five functionally distinct trait categories (grain yield and components; plant development and architecture; plant physiology and stress adaptation; grain quality and nutritional composition; and disease and pest resistance) in order to identify functionally classified genomic hotspots and prioritize candidate genes for multi-trait breeding applications. Results: BioMercator V4.2 consolidated 2,518 projectable QTLs into 187 high-confidence meta-QTLs (MQTLs), achieving an average 59% reduction in confidence interval width; 128 of 187 MQTLs (68.4%) achieved dual-platform support through GWAS co-localization. Twenty-three genomic hotspots harbored 132 of 187 MQTLs (70.6%) and were classified into three functional categories: twelve multi-trait hubs that may enable simultaneous improvement of multiple traits through pleiotropic or tightly linked genes; seven single-trait clusters with pathway-specific effects, exemplified by the chromosome 9 starch biosynthesis cluster; and four major-effect loci with reported individual effects exceeding 20% PVE, including vgt1 (54% PVE) and opaque2 (34.2% PVE). Descriptive environmental classification distinguished MQTLs predominantly supported by optimal-condition QTLs (42%) from those predominantly supported by stress-condition QTLs (28%), the latter showing approximately 3.5-fold greater mean contributing-QTL phenotypic variance, directionally consistent with conditional genetic effect amplification under stress. 
Network-based candidate gene prioritization combined with cross-cereal ortholog analysis showed that 67% of the top candidates possess orthologs in rice, sorghum, wheat, or barley, and 53% are conserved across all four species, identifying priority targets for functional genomics investment. Conclusions: This functionally classified and environmentally characterized meta-QTL framework provides breeders with a structured resource for multi-trait hotspot selection, environment-appropriate allele deployment, and functional genomics prioritization, with broader applicability as a transferable analytical template for other crop species confronting analogous challenges of fragmented QTL literature and complex multi-trait breeding objectives.

12

Genetic Dissection of Grain Yield and Correlated Proxy Traits Under Suboptimal Conditions

Lin, Y.-C.; Urbany, C.; Shlykova, A.; Hoelker, A.; Ouzunova, M.; Prester, T.; Pook, T.; Mayer, M.; Urzinger, S.; Schoen, C. C.

2026-04-24 genetics 10.64898/2026.04.22.720082 medRxiv

Top 0.1%

15.0%

Securing sustainable crop production requires the genetic improvement of abiotic stress tolerance. Due to the broad range of environmental factors causing abiotic stress and complex genotype-by-environment interactions, it is crucial to understand the genetic basis of crop yield under suboptimal conditions. Here, we developed a dent maize Multi-parent Advanced Generation Inter-Cross (MAGIC) population comprising 388 doubled haploid (DH) lines. The population was derived from eight founders with varying stress tolerance, selected from a dent diversity panel evaluated for yield performance across a wide range of European environments. The MAGIC DH lines were genotyped via whole-genome sequencing (~5X coverage) and evaluated in seven testcross and 14 line per se trials, for grain dry matter yield, leaf senescence, leaf rolling, anthesis-silking interval, and six additional agronomic traits. Genetic dissection identified 22 grain yield QTL, explaining 45% of the genetic variance. Under heat and drought stress, testcross grain yield correlated significantly with leaf senescence and leaf rolling measured in line per se trials. Bivariate multi-trait analysis showed that alleles for delayed senescence and reduced rolling at detected QTL generally exhibited positive effects on grain yield, suggesting that accumulating these favorable alleles could enhance yield performance. Incorporating these proxies into multi-trait genomic prediction models improved yield prediction accuracy, although gains were constrained by modest trait correlations. Given the comprehensive data, we also provide recommendations for optimizing sequencing depth and QTL mapping strategies in experimental maize populations. Key message: This eight-founder MAGIC population represents a powerful resource for dissecting complex traits in maize, assessing the utility of drought proxy traits, and optimizing low-coverage whole-genome sequencing approaches.

13

Characterization of genetically effective cells and EMS mutagenesis on the novel winter oil seed Pennycress (Thlaspi arvense)

Brusa, A.; Branch, C.; Sulivan, L.; Chopra, R.; Rai, K.; Rockstad, G.; Gjesvold, E. S.; Ott, M.; Jain, S.; Biel, C. C.; Marks, M. D.

2026-05-05 genomics 10.64898/2026.04.30.722012 medRxiv

Top 0.1%

14.7%

Pennycress (Thlaspi arvense L.) is an intermediate winter oilseed crop that has only recently been domesticated for agronomic use. Improving agronomic traits requires sources of genetic variation, and mutagenesis is frequently used to help overcome the limitations of natural populations. We investigate the impact of ethyl methanesulfonate (EMS) on genetically effective cells (GECs) to characterize the intra-individual genetic variation of EMS mutagenesis in pennycress. We found that pennycress contains at least four GECs which, when treated with EMS, create unique mutations across different branches within the same individual plant. We then propagated the M2 plants for whole genome sequencing, providing extensive characterization of the EMS mutation profile and developing a gene index as a resource for future reverse genetic screenings. Article summary: Pennycress is an emerging winter oilseed crop in the American Midwest. Domestication efforts have advanced rapidly through a combination of genetic techniques. One of the most successful methods has been the use of a mutant gene index, a large collection of pennycress seed in which new genetic variation has been created through ethyl methanesulfonate (EMS) treatment. EMS mutations are not uniform, however, and a single treated seed can give rise to wide genetic variation within the resulting plant. We investigate the role of genetically effective cells in EMS variation, and present the full EMS population as a resource for further pennycress domestication efforts.

14

A novel matrix multiplication framework for modeling genotype-by-environment interaction in genomic prediction

Montesinos-Lopez, O. A.; Montesinos-Lopez, A.; Montesinos-Lopez, J. C.; Crossa, J.; Dreisigacker, S.; Hernandez-Suarez, C. M.; Ortiz, R.

2026-05-15 genetics 10.64898/2026.05.11.724414 medRxiv

Top 0.1%

13.1%

Accurate modeling of genotype-by-environment (GxE) interaction is critical for genomic prediction in plant breeding but remains challenging due to complex interaction structures. Conventional models often use the Hadamard product of genotype and environment covariance matrices to capture joint similarity, which may not fully represent GxE complexity. Here we propose a novel framework that derives covariance structures from the matrix multiplication of genotype and environment kernels, decomposing these into symmetric components incorporated as random effects in mixed models. Evaluated across 11 wheat and rice multi-environment datasets, this approach consistently outperformed the traditional Hadamard-based model, improving prediction accuracy by up to 13.2% in Pearson's correlation and enhancing top-selection accuracy. Combining both methods yielded the highest performance, indicating complementary information capture. This framework offers a flexible, interpretable, and computationally feasible extension for modeling GxE interaction, potentially enhancing genomic selection effectiveness under diverse environmental conditions.
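
The contrast between the two covariance constructions can be illustrated with a minimal sketch. The abstract does not specify how the matrix product is decomposed, so taking the symmetric part (P + P')/2 of the product is an assumption made here for illustration only, not the authors' method. The toy kernels G and E, the observation layout, and all function names are hypothetical.

```python
# Hypothetical illustration: Hadamard vs. matrix-product GxE covariance structures.
# Pure Python, no third-party dependencies.

def matmul(a, b):
    """Ordinary matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def hadamard(a, b):
    """Element-wise (Hadamard) product of two same-shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def symmetric_part(p):
    """(P + P') / 2; assumed here as one possible symmetric component."""
    pt = transpose(p)
    return [[0.5 * (x + y) for x, y in zip(r, rt)] for r, rt in zip(p, pt)]

def expand(kernel, index):
    """Lift a level-by-level kernel to observation level:
    K_obs[i][j] = kernel[index[i]][index[j]]."""
    return [[kernel[i][j] for j in index] for i in index]

# Toy kernels (made-up values): 2 genotypes and 2 environments.
G = [[1.0, 0.3], [0.3, 0.8]]
E = [[1.0, 0.2], [0.2, 1.5]]
geno = [0, 0, 1, 1]  # genotype of each of the 4 observations
env = [0, 1, 0, 1]   # environment of each observation

Kg = expand(G, geno)
Ke = expand(E, env)

K_had = hadamard(Kg, Ke)   # conventional GxE kernel
P = matmul(Kg, Ke)         # matrix product, generally not symmetric
K_sym = symmetric_part(P)  # symmetric, so usable as a covariance candidate
```

Note that the symmetric part of a product of two positive semi-definite matrices is not guaranteed to be positive semi-definite, so in practice such a component might need further treatment before use as a random-effect covariance.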

15

Haplotypes variations of yellow stripe like (TaYSL) genes are associated with grain iron and zinc contents in wheat (Triticum aestivum L.)

Abbasi, K.; Qayyum, H.; Naseer, S.; Sun, M.; Quraishi, M. A.; Danyal, Y.; Hao, Y.; He, Z.; Rasheed, A.

2026-07-08 plant biology 10.64898/2026.06.17.732851 medRxiv

Top 0.1%

13.0%

The availability of a pangenome and the resequencing of wheat collections have facilitated the discovery of gene-trait associations in wheat. Yellow stripe-like (YSL) proteins play a key role in the uptake and translocation of metals, yet have not been fully identified and analyzed at the genome-wide level in wheat. In this study, 26 TaYSL genes were identified and divided into four distinct clades, each clade sharing similar domains and motif compositions. Most genes were upregulated under iron deficiency, whereas homoeologs of TaYSL1 were downregulated. Both SNP-based and haplotype-based association studies were used to dissect the role of TaYSLs underpinning grain iron content (GFeC) and zinc content (GZnC) in wheat. TaYSL6-2B and TaYSL16-1A haplotypes showed strong association with GFeC, and TaYSL14-6A showed strong association with GZnC in multiple field trials. The distribution of favorable haplotypes in a global wheat collection of ~3000 accessions showed that the majority of haplotypes were more prevalent in landraces and winter wheat than in modern cultivars and spring types, indicating their potential for use in breeding. The combination of favorable haplotypes of the three YSL genes associated with GFeC and GZnC was very rare, and most of the wheat accessions have single or double favorable haplotypes. These findings provide the first comprehensive characterization of the TaYSL gene family in wheat and identify significant SNPs and elite haplotypes that can be utilized for genetic improvement and biofortification.

16

From Phenomics to Genomics: Macro-GWAS of Almond Morphology and Quality

Mas Gomez, J.; Rubio Angulo, M.; Duval, H.; Dicenta, F.; Martinez-Garcia, P. J.

2026-07-07 plant biology 10.64898/2026.07.06.736816 medRxiv

Top 0.1%

12.7%

In plant breeding and genetics, recent advances in high-throughput phenotyping are beginning to meet the growing demand for large-scale, high-quality phenotypic data that emerged after the development of next-generation sequencing technologies. Recent developments in phenomics have been incorporated into almond breeding programs, facilitating the large-scale acquisition of quantitative phenotypes and the dissection of the genetic architecture underlying morphological and quality-related traits. The implementation of a high-throughput phenotyping platform integrating RGB and hyperspectral imaging with genotyping using the 60K almond SNP array enabled the large-scale characterization of almond populations and the identification of 567 robust marker-trait associations across 66 traits. These analyses revealed two major genomic hotspots on chromosomes 2 and 5 associated with morphological and quality-related traits. These regions harbored biologically relevant candidate genes, including genes associated with OVATE family proteins, brassinosteroid signaling, protein ubiquitination, and acyl-CoA metabolism, as well as other regulators of organ growth, cell proliferation, hormone signaling, and seed development. Furthermore, a novel candidate gene encoding a COMT-like O-methyltransferase involved in lignin biosynthesis was identified and proposed to contribute to shell hardness, a major genetically controlled trait in almond. Together, these findings demonstrate the potential of integrating high-throughput phenomics and genomics to dissect complex traits, identify candidate genes, and accelerate genomics-informed breeding in almond.

17

Genomic Prediction Enables Same-Season Selection for Reduced Glycosidic Nitrile in Eastern U.S. Winter Barley

Perry, A. D.; Sabadin, F.; Brooks, W.; Brown-Guedira, G.; Uhlmann, H.; Bettenhausen, H.; Santantonio, N.

2026-06-06 plant biology 10.64898/2026.06.03.729884 medRxiv

Top 0.1%

12.7%

Glycosidic nitriles (GN) in barley are precursors to carcinogens formed during distillation, making GN reduction a critical breeding objective for malting and distilling industries. Measurement of GN is time-consuming: grain must first be malted before GN can be quantified, so measurement generally cannot be completed before selections must be made in a winter barley breeding program. Here, the feasibility of same-season genomic selection against GN content was evaluated in elite Virginia Tech winter barley germplasm. In 2023, all 176 elite breeding lines screened for presence of GN were shown to be GN producers. A subset of 95 lines was then quantitatively measured for GN concentration to determine the genetic variability for the trait. Efficacy of genomic selection for GN was first assessed using a divergent selection approach on the remaining 81 predicted lines. The highest 16 and lowest 16 of the predicted lines were chosen for GN quantification. A significant phenotypic difference was found between the predicted high and low group means (0.8 ppm; P = 0.003). An additional 120 lines were quantified the following year to determine repeatability. GN exhibited moderate narrow-sense heritability (h² = 0.42) and a high genetic correlation (r = 0.79) across years. Moderate predictive ability was observed in cross-validation (range 0.38 - 0.61) and in forward prediction using 2023 to predict 2024 (r = 0.39). A genome-wide scan did not identify any major-effect loci, suggesting GN content is polygenic, thus enabling same-season genomic selection to reduce GN content in this germplasm.

18

Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv

Top 0.1%

12.0%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.
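
As a rough illustration of modeling allele frequency as a function of crossing year, the sketch below fits a binomial logistic regression to per-year allele counts for a single SNP. The abstract only states that a generalized linear model was used; the binomial family, logit link, Newton-Raphson fitting, the function name `fit_allele_trend`, and the toy counts are all assumptions introduced here, not details from the preprint.

```python
import math

def fit_allele_trend(years, alt_counts, totals, n_iter=25):
    """Fit logit(p_t) = b0 + b1 * (year - mean year) by Newton-Raphson.

    alt_counts[t] is the number of alternative alleles observed in year t
    out of totals[t] sampled alleles. Returns (b0, b1, se_b1).
    """
    mean_year = sum(years) / len(years)
    x = [y - mean_year for y in years]  # centering improves conditioning
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, k, n in zip(x, alt_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = n * p * (1.0 - p)  # binomial variance weight
            r = k - n * p          # score residual
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    # Standard error of the slope from the inverse information matrix
    # (evaluated at the last iterate before the final, negligible step).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Made-up counts for one SNP over eight crossing years (hypothetical data).
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
totals = [200] * 8
alt_counts = [40, 46, 55, 60, 70, 78, 85, 96]

b0, b1, se_b1 = fit_allele_trend(years, alt_counts, totals)
z_score = b1 / se_b1  # large |z| would flag a candidate non-neutral variant
```

Under these assumptions, a positive slope would be read as a signal of positive selection and a negative slope as negative selection; a genome-wide application would also need a correction for drift and multiple testing, which this sketch does not attempt.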

19

A genetic toolkit to reduce wheat immunogenicity and incidence of celiac disease

Rottersman, M. G.; Laudencia-Chingcuanco, D.; Zhang, W.; Guzman-Lopez, M. H.; Lin, J. W.; Zhang, J.; Caseys, C.; Burguener, G.; Kim, S.; Zhang, X.; Yunusbaev, U.; Akhunov, E.; Lee, J.-Y.; Dubcovsky, J.

2026-07-08 plant biology 10.64898/2026.06.23.734071 medRxiv

Top 0.1%

11.9%

Celiac disease (CeD) is an immune-mediated condition triggered by wheat gluten in genetically predisposed individuals. The immune reaction in people with CeD is driven by particular gluten amino acid sequences, or immunogenic epitopes. Some of these epitopes elicit strong immune responses in the majority of CeD patients and are designated as immunodominant epitopes. Previous research has shown correlations between the amount of immunogenic wheat epitopes consumed and the onset of CeD, suggesting that reducing wheat immunogenic epitopes may reduce CeD incidence at the population level. Gluten consists of gliadins and glutenins, with gliadins having the majority of the immunodominant epitopes and glutenins playing a major role in dough strength and breadmaking quality (BMQ). This study used radiation-induced deletions, chemical mutagenesis, and natural variation in wheat (Triticum aestivum) to generate genetic stocks with reduced immunogenic epitope content. Most lines were developed in the wheat cultivar Summit, for which we produced a full genome assembly and annotation. We used exome capture to characterize these deletions and identify prolamins located within and outside the deletions. We combined different deletions and developed molecular markers to facilitate their deployment. For chromosome arms 1BS and 1DS, we generated two alternative lines: one lacking immunogenic epitopes for the development of CeD-safe genetic stocks for research purposes, and another retaining selected glutenins for breeding commercial lines with reduced immunogenicity and adequate BMQ. By making these non-transgenic genetic stocks publicly available, we aim to accelerate the development of wheat varieties with reduced immunogenicity and, eventually, a fully CeD-safe wheat.

20

Identifying water stress response haplotypes in barley using latent environmental covariates

Aldiss, Z.; Brunner, S.; Heidariask, B.; Chenu, K.; Van Haeften, S.; Baraibar, S.; Ganesgalingam, D.; Moody, D.; Hickey, L.; Lam, Y.

2026-05-07 plant biology 10.64898/2026.05.04.722807 medRxiv

Top 0.1%

11.8%

Purpose: Genotype-by-environment (G x E) interactions represent a major obstacle to increasing genetic gain in crop breeding, with the underlying physiological drivers often remaining obscured within conventional statistical models. This case study presents a novel framework that transforms the latent factors from Factor Analytic (FA) multi-environment trial (MET) models into heritable quantitative traits, enabling the genetic dissection of adaptive response patterns. Methods: A Factor Analytic Linear Mixed Model (FA-LMM) was fit to plot-level yield data for 1,036 barley genotypes across eight Australian trials. Results: Correlation of the factor loadings with APSIM-simulated environmental covariates demonstrated that the second latent factor FA2 was strongly correlated with the Water Stress Index (r = -0.83) during the critical flowering period, establishing water availability as the main biological axis of crossover G x E. Genotypic scores for the derived traits, Overall Performance (OP) and Water Stress Response (WSR), were subjected to high-resolution haplotype-based mapping using local Genomic Estimated Breeding Values (GEBV). Conclusion: This analysis successfully identified major genomic regions that accounted for a substantial proportion of the additive genetic variance. Gene Ontology enrichment of candidate genes within the top haploblocks implicated fundamental pathways related to energy homeostasis, root development, and stress response, with notable candidates including FTsH11, BPS1, and TDP1. The distribution of favourable Haplotypes of Interest (HOI) in elite cultivars suggested a historical signature of inadvertent selection for these adaptive mechanisms. This framework provides an explicit bridge between statistical modelling and functional genomics, offering breeders actionable genetic targets for accelerated development of climate-resilient cereals.
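
The step of relating factor loadings to environmental covariates can be sketched as a simple Pearson correlation screen across trials. This is only an illustrative sketch: the loadings and covariate values below are synthetic and do not come from the study, and the helper names `pearson` and `best_matching_covariate` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_matching_covariate(loadings, covariates):
    """Return (name, r) for the covariate with the largest |r| against the loadings."""
    scores = {name: pearson(loadings, values) for name, values in covariates.items()}
    name = max(scores, key=lambda k: abs(scores[k]))
    return name, scores[name]

# Synthetic example: one latent-factor loading per trial (eight trials) and
# two per-trial environmental covariates. All values are made up.
fa2_loadings = [0.9, 0.6, 0.4, 0.1, -0.2, -0.5, -0.7, -1.0]
covariates = {
    "water_stress_index": [0.1, 0.2, 0.3, 0.35, 0.5, 0.6, 0.8, 0.9],
    "mean_temperature": [18, 22, 19, 23, 20, 24, 19, 21],
}

name, r = best_matching_covariate(fa2_loadings, covariates)
```

With only eight trials, any such correlation rests on very few points, so the screen is best read as a way to suggest a biological interpretation for a latent factor rather than as a formal test.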